4 research outputs found

    Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. In this paper, we present a geometrical-based automatic lip reading system that extracts the lip region from images using conventional techniques, but extracts the contour itself using a novel combination of border-following and convex-hull approaches. Classification is carried out using an enhanced dynamic time warping technique that is able to operate in multiple dimensions, together with a template probability technique that compensates for differences in the way words are uttered in the training set. The performance of the new system has been assessed on recognition of the English digits 0 to 9 as available in the CUAVE database. The experimental results obtained from the new approach compared favorably with those of existing lip reading approaches, achieving a word recognition accuracy of up to 71%, with the visual information obtained from estimates of lip height, width and their ratio.
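    The core of the classifier above is dynamic time warping applied to multi-dimensional feature sequences (here, per-frame lip height, width and their ratio). The sketch below shows only the standard multi-dimensional DTW distance; the paper's enhancements (template probabilities) are not reproduced, and the feature layout is an assumption.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Multi-dimensional DTW distance between two feature sequences.

    Each row of seq_a / seq_b is one video frame's feature vector
    (assumed here: [lip_height, lip_width, height/width ratio]).
    The per-frame cost is the Euclidean distance across all dimensions,
    which is what makes this 'multi-dimensional' DTW.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # extend the cheapest of the three admissible warping steps
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

    In a digit recognizer, each test utterance would be compared against templates for "zero" through "nine" and assigned the label of the nearest template.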

    Investigation of dimensionality reduction in a finger vein verification system

    Popular methods of protecting access, such as Personal Identification Numbers and smart cards, are subject to security risks arising from accidental loss or theft. Risk can be reduced by adopting direct methods that identify the person, and these are generally biometric methods such as iris, face, voice and fingerprint recognition approaches. In this paper, a finger vein recognition method has been implemented in which the effect of principal component analysis on performance has been investigated. The data were obtained from the finger-vein database SDUMLA-HMT, and the images underwent contrast-limited adaptive histogram equalization and noise filtering for contrast improvement. The vein pattern was extracted using repeated line tracking, with dimensionality reduction by principal component analysis used to generate the feature vector. A 'speeded-up robust features' algorithm was used to determine the key points of interest, and the Euclidean distance was used to estimate similarity between database images. The results show that the use of a suitable number of principal components can improve the accuracy and reduce the computational overhead of the verification system.
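    The dimensionality-reduction step above can be sketched as plain PCA via the singular value decomposition: center the feature matrix, take the top-k principal directions, and project. This is a minimal illustration of the PCA stage only, not the paper's full verification pipeline, and the choice of SVD is an implementation assumption.

```python
import numpy as np

def pca_reduce(features, k):
    """Project row-vector features onto the top-k principal components.

    features: (n_samples, n_dims) matrix, e.g. flattened vein-pattern
    descriptors (an assumed feature layout).
    Returns the reduced data, the components and the mean, so that new
    probe images can be projected consistently at verification time.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    reduced = centered @ components.T
    return reduced, components, mean
```

    Varying k is exactly the experiment the abstract describes: too few components lose discriminative vein detail, while too many retain noise and increase the matching cost.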

    Image fusion based multi-resolution and frequency partition discrete cosine transform for palm vein recognition

    The rapid growth of technology has increased the demand for automated security systems. Due to the accessibility of the palm region and the unique characteristics of each individual's palm vein features, such biometrics have been receiving particular attention. In the published research relating to palm vein biometrics, usually only a single image is used to supply the data for recognition purposes. Previous experimental work has demonstrated that the fusion of multiple images is able to provide richer feature information, resulting in improved classification performance. However, although most image fusion techniques are able to preserve the vein pattern, the fused image is often blurred, the colors are distorted and the spatial resolution is reduced. In this paper, multi-resolution discrete cosine transform (MRDCT) and frequency partition DCT (FPDCT) image fusion are applied, and these are able to extract the finer details of vein patterns while reducing the presence of noise in the image. The results show that the use of MRDCT and FPDCT improved the recognition rate compared with using a single image. The equal error rate improvement is also significant, falling to 9% for the 700 nm image, 7% for the 850 nm image and 6% for the 940 nm image.
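    The general shape of DCT-domain image fusion can be sketched as: transform both source images with a 2-D DCT, combine the coefficients, and invert. The max-absolute-coefficient fusion rule below is a common choice but an assumption here; the abstract does not state the MRDCT/FPDCT coefficient-selection rules, and this sketch omits the multi-resolution and frequency-partition structure.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (inverse = transpose)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)  # DC row has a different normalization
    return m

def fuse_images(img_a, img_b):
    """Fuse two equally sized grayscale images in the DCT domain,
    keeping the coefficient of larger magnitude at each frequency
    (max-abs rule; an assumed fusion rule, not the paper's)."""
    n, m = img_a.shape
    dn, dm = dct_matrix(n), dct_matrix(m)
    ca = dn @ img_a @ dm.T                # forward 2-D DCT of image A
    cb = dn @ img_b @ dm.T                # forward 2-D DCT of image B
    fused = np.where(np.abs(ca) >= np.abs(cb), ca, cb)
    return dn.T @ fused @ dm              # inverse 2-D DCT
```

    Because high-magnitude DCT coefficients tend to carry the strongest structure, the max-abs rule favors whichever source image shows the vein pattern more clearly at each frequency.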

    Feature-fusion based audio-visual speech recognition using lip geometry features in noisy environments

    Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception generates a rich combination of information that can be used in the recognition of speech. However, due to wide variability in the lip movement involved in articulation, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin-color filter, border following and convex hull, and performs classification using a Hidden Markov Model. The new approach is compared with a conventional audio-only system when operating under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
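    In feature-fusion AVSR, the audio and visual feature streams are combined into a single observation vector per frame before classification. The sketch below assumes the common case where the visual stream (video frame rate) is slower than the audio stream (acoustic frame rate) and is upsampled by nearest-frame alignment; the abstract does not specify the alignment method, so that choice, and the feature dimensions in the comments, are assumptions.

```python
import numpy as np

def fuse_features(audio_feats, visual_feats):
    """Frame-level feature fusion for AVSR.

    audio_feats:  (n_audio, d_a) acoustic features, e.g. MFCCs
    visual_feats: (n_video, d_v) lip-geometry features, e.g.
                  [height, width, ratio] per video frame
    Returns an (n_audio, d_a + d_v) fused observation sequence that a
    single HMM can be trained on.
    """
    n_audio = len(audio_feats)
    # nearest-frame upsampling of the slower visual stream
    # (an assumed alignment; the paper does not state its method)
    idx = np.linspace(0, len(visual_feats) - 1, n_audio)
    idx = idx.round().astype(int)
    return np.hstack([audio_feats, visual_feats[idx]])
```

    The fused vectors are then modeled by one HMM per word, so the visual dimensions keep contributing evidence even when noise corrupts the acoustic dimensions.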